Indexing Weblogs One Post at a Time
Abstract
To perform analysis over weblogs, we must first identify the unit of a weblog that corresponds to a document. In this paper we argue that, for weblogs, the correct unit is the weblog post. A weblog post is a structured document with the following fields: date, timestamp, title, content, permalink, and author. We present our approach for segmenting weblogs into posts, which consists of three steps: (1) automatic feed discovery; (2) feed-guided segmentation, using both the weblog feed and its HTML; and (3) model-based weblog segmentation.
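The post record and the first two steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation.

The assumptions are as follows. Feed discovery follows the common `<link rel="alternate" type="application/rss+xml">` autodiscovery convention. The mapping from RSS 2.0 elements (`pubDate`, `link`, `description`, `author`) onto the six post fields is one plausible choice. The names `WeblogPost`, `discover_feeds`, and `posts_from_rss` are hypothetical.

Aligning feed items with the weblog's HTML (the rest of step 2) and the model-based step (3) are not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class WeblogPost:
    """One post, with the six fields listed in the abstract (hypothetical type)."""
    date: str
    timestamp: str
    title: str
    content: str
    permalink: str
    author: Optional[str] = None


# MIME types conventionally advertised by <link rel="alternate"> feed tags (assumption).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class _FeedLinkFinder(HTMLParser):
    """Collects feed URLs from <link rel="alternate" type=...> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if ("alternate" in rels and a.get("type", "").lower() in FEED_TYPES
                and a.get("href")):
            # Resolve relative hrefs against the page URL.
            self.found.append(urljoin(self.base_url, a["href"]))


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Step (1): list the feed URLs a weblog page advertises (hypothetical helper)."""
    finder = _FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found


def posts_from_rss(rss_xml: str) -> List[WeblogPost]:
    """Feed side of step (2): one WeblogPost per RSS 2.0 <item> (hypothetical helper).

    Feeds often carry truncated content, so a full system would still
    align each item with its region of the weblog's HTML (not shown).
    """
    posts: List[WeblogPost] = []
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return posts
    for item in channel.findall("item"):
        date = timestamp = ""
        raw = item.findtext("pubDate")
        if raw:
            try:
                dt = parsedate_to_datetime(raw)
                date, timestamp = dt.date().isoformat(), dt.time().isoformat()
            except (TypeError, ValueError):
                pass  # leave date/timestamp empty if pubDate is malformed
        posts.append(WeblogPost(
            date=date,
            timestamp=timestamp,
            title=item.findtext("title", default=""),
            content=item.findtext("description", default=""),
            permalink=item.findtext("link", default=""),
            author=item.findtext("author"),
        ))
    return posts
```

Splitting `pubDate` into separate `date` and `timestamp` strings mirrors the abstract's field list. Whether the original system stored them this way is not stated.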
Similar Resources
BlogPulse: Automated Trend Discovery for Weblogs
Over the past few years, weblogs have emerged as a new communication and publication medium on the Internet. In this paper, we describe the application of data mining, information extraction and NLP algorithms for discovering trends across our subset of approximately 100,000 weblogs. We publish daily lists of key persons, key phrases, and key paragraphs to a public web site, BlogPulse.com. In a...
Adaptive Weblog Post Filtering Based on User Browsing History
One of the most important Web-based services that laid the foundations of Web 2.0 is the weblog. Weblogs are evolving into topic-based systems that can generate more revenue for companies; therefore, many companies provide free weblog hosting. Weblog popularity is an important factor in generating revenue. Weblogs have posts and topics that are arranged chronologically with the most re...
Mapping the Blogosphere in America
This short paper constitutes the first phase of a long-term project focused on probing American urban culture by examining the hyperlinks and text of personal weblogs. It discusses methods of extracting geographic location information from weblogs and ways of indexing weblogs to city units. After a brief introduction to the broader research plan, the paper proposes a process to automatically ex...
E-Tools to Assist EFL Learners' Writing Skill: Wikis, Weblogs, and Podcasts
One of the promises of web-based education is to let students control their own learning pace, since a basic requirement of language learning is that it be lifelong. The purpose of the present study was to find out which of three e-tools (weblogs, wikis, or podcasts) best helps EFL learners improve their writing skill. To this end, 156 Iranian sophomore students majoring in English and s...
Blogs Search Engine Using RSS Syndication and Fuzzy Parameters
The rapid development of the Internet has increased the number of Internet users, triggering the need for an intelligent search engine that can narrow searches on the World Wide Web (WWW) and find the relevant information requested. To address the problem of finding relevant information while minimizing search on the WWW, this paper proposes a search engine that is specifically ...